Feature Selection using Multiple Streams

نویسندگان

  • Paramveer S. Dhillon
  • Dean P. Foster
  • Lyle H. Ungar
چکیده

Feature selection for supervised learning can be greatly improved by making use of the fact that features often come in classes. For example, in gene expression data, the genes which serve as features may be divided into classes based on their membership in gene families or pathways. When labeling words with senses for word sense disambiguation, features fall into classes including adjacent words, their parts of speech, and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are of different sizes and have different (but unknown) fractions of good features. Experimental results show that our approach provides significant improvement in performance and is computationally less expensive than comparable “batch” methods that do not take advantage of the feature classes and expect all features to be known in advance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

متن کامل

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...

متن کامل

Comprehensive causal analysis of occupational accidents’ severity in the chemical industries; A field study based on feature selection and multiple linear regression techniques

Introduction: The causal analysis of occupational accidents’ severity in the chemical industries may improve safety design programs in these industries. This comprehensive study was implemented to analyze the factors affecting occupational accidents’ severity in the chemical industries. Methods and Materials: An analytical study was conducted in 22 chemical industries during 2016-2017. The stu...

متن کامل

Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...

متن کامل

Discovering Highly Informative Feature Sets from Data Streams

How to select interesting feature sets from data streams is a new and important research topic in which there are three major challenges. First of all, instead of discovering features individually and independently, we are interested in comprehensively selecting a subset of features whose joint importance or weight is the highest. Secondly, we are concerned with the problem of selecting feature...

متن کامل

The Health Policy Process in Vietnam: Going Beyond Kingdon’s Multiple Streams Theory; Comment on “Shaping the Health Policy Agenda: The Case of Safe Motherhood Policy in Vietnam”

This commentary reflects upon the article along three broad lines. It reflects on the theoretical choices and omissions, particularly highlighting why it is important to adapt the multiple streams framework (MSF) when applying it in a socio-political context like Vietnam’s. The commentary also reflects upon the analytical threads tackled by Ha et al; for instance, it highlights the opportunitie...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010